Flexible collective communication tuning architecture applied to Open MPI
Abstract
Collective communications are invaluable to modern high performance applications, yet most users of these communication patterns do not need to know their innermost workings. The implementation of the collectives is therefore usually left to the middleware developer, such as the providers of an MPI library. As many of these libraries are designed to be both generic and portable, MPI developers commonly offer internal tuning options, suitable only for knowledgeable users, that allow some level of customization. The work presented in this paper aims not only to provide a very efficient set of collective operations for use with the Open MPI implementation, but also to make their control and tuning straightforward and flexible. Additionally, this paper demonstrates a novel example of the proposed framework's flexibility by dynamically tuning an MPI Alltoallv algorithm at runtime.
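For orientation, the operation tuned in the paper's runtime example can be exercised with a plain MPI_Alltoallv call. The sketch below is not code from the paper; it is a minimal, self-contained C program with hypothetical buffer sizes in which every rank exchanges one integer with every other rank, i.e. the communication pattern whose underlying algorithm a tuning framework would select.

/*
 * Minimal MPI_Alltoallv sketch (illustrative only, not from the paper):
 * every rank sends one integer to every other rank, including itself.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One int per peer; contiguous layout, so displacements are just indices. */
    int *sendbuf    = malloc(size * sizeof(int));
    int *recvbuf    = malloc(size * sizeof(int));
    int *sendcounts = malloc(size * sizeof(int));
    int *recvcounts = malloc(size * sizeof(int));
    int *sdispls    = malloc(size * sizeof(int));
    int *rdispls    = malloc(size * sizeof(int));

    for (int i = 0; i < size; i++) {
        sendbuf[i]    = rank * 100 + i;  /* arbitrary payload */
        sendcounts[i] = 1;
        recvcounts[i] = 1;
        sdispls[i]    = i;
        rdispls[i]    = i;
    }

    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                  recvbuf, recvcounts, rdispls, MPI_INT,
                  MPI_COMM_WORLD);

    printf("rank %d received %d from rank 0\n", rank, recvbuf[0]);

    free(sendbuf); free(recvbuf);
    free(sendcounts); free(recvcounts);
    free(sdispls); free(rdispls);

    MPI_Finalize();
    return 0;
}

For reference, recent Open MPI releases expose this style of user-level control through MCA parameters of the "tuned" collective component, for example mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm 2 ./a.out; the exact parameter names and accepted values vary by Open MPI version.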
Related articles
Tuning MPI Collectives by Verifying Performance Guidelines
MPI collective operations provide a standardized interface for performing data movements within a group of processes. The efficiency of collective communication operations depends on the actual algorithm, its implementation, and the specific communication problem (type of communication, message size, number of processes). Many MPI libraries provide numerous algorithms for specific collecti...
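As a concrete, hypothetical illustration of the guideline-checking idea: one widely cited self-consistency rule is that MPI_Allreduce should not be slower than an MPI_Reduce followed by an MPI_Bcast of the same data. The sketch below is not taken from the paper; it times both variants with arbitrary message sizes and repetition counts and reports whether that guideline holds.

/*
 * Hedged sketch of a single performance-guideline check (assumed sizes):
 * compare MPI_Allreduce(n) against the composition MPI_Reduce(n) + MPI_Bcast(n).
 */
#include <mpi.h>
#include <stdio.h>

#define N    (1 << 16)   /* doubles per message (illustration value) */
#define REPS 50          /* timing repetitions (illustration value) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    static double in[N], out[N];
    for (int i = 0; i < N; i++) in[i] = (double)i;

    /* Time MPI_Allreduce. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++)
        MPI_Allreduce(in, out, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t_allreduce = (MPI_Wtime() - t0) / REPS;

    /* Time the composed alternative: reduce to root, then broadcast. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++) {
        MPI_Reduce(in, out, N, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        MPI_Bcast(out, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }
    double t_composed = (MPI_Wtime() - t0) / REPS;

    if (rank == 0)
        printf("Allreduce %.6fs vs Reduce+Bcast %.6fs -> guideline %s\n",
               t_allreduce, t_composed,
               t_allreduce <= t_composed ? "holds" : "violated");

    MPI_Finalize();
    return 0;
}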
Adaptive Selection of Communication Methods to Optimize Collective MPI Operations
Many parallel applications from scientific computing use collective MPI communication operations to distribute or collect data. The execution time of collective MPI communication operations can be significantly reduced by a restructuring based on orthogonal processor structures or by using specific point-to-point algorithms based on virtual communication topologies. The performance improvement d...
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?
Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This couple...
Personalized MPI library for Exascale Applications and Environments
Minimizing the communication costs associated with a parallel application is a key challenge for the scalability of petascale and future exascale applications. This paper introduces the notion of a personalized MPI library that is customized for a particular application and platform. The work is based on the Open MPI communication library, which has a large number of runtime parameters that can ...
Implementing a Hardware-Based Barrier in Open MPI
Open MPI is a recent open source development project which combines features of different MPI implementations. These features include fault tolerance, multi-network support, grid support and a component architecture which ensures extensibility. The TUC Hardware Barrier is a special-purpose, low-latency barrier network based on commodity hardware. We show that the Open MPI collective framework ca...
Publication year: 2006